Skip to content

Group scan time expression rewrite functionality for UDFs in new module in datafusion-physical-expr-adapter#23125

Merged
alamb merged 5 commits into
apache:mainfrom
AdamGS:adamg/udf-specific-rewrites
Jun 23, 2026
Merged

Group scan time expression rewrite functionality for UDFs in new module in datafusion-physical-expr-adapter#23125
alamb merged 5 commits into
apache:mainfrom
AdamGS:adamg/udf-specific-rewrites

Conversation

@AdamGS

@AdamGS AdamGS commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

  • Closes #.

Rationale for this change

Instead of continuously adding more and more UDF-specific rewrite utilities to datafusion/physical-expr-adapter/src/schema_rewriter.rs, move all of them to a new rewrite module.

This issue was raised in recent PRs that added input_file_name() and file_row_index().

What changes are included in this PR?

  1. Introduce a new rewrite module in datafusion-physical-expr-adapter, and move some functionality to it.

Are these changes tested?

Existing testing suites.

Are there any user-facing changes?

I don't think any of this code was released.

Signed-off-by: Adam Gutglick <adamgsal@gmail.com>
@github-actions github-actions Bot added the datasource Changes to the datasource crate label Jun 23, 2026
@github-actions

github-actions Bot commented Jun 23, 2026

Copy link
Copy Markdown

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-datasource v54.0.0 (current)
       Built [  46.128s] (current)
     Parsing datafusion-datasource v54.0.0 (current)
      Parsed [   0.037s] (current)
    Building datafusion-datasource v54.0.0 (baseline)
       Built [  39.532s] (baseline)
     Parsing datafusion-datasource v54.0.0 (baseline)
      Parsed [   0.039s] (baseline)
    Checking datafusion-datasource v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.400s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [  87.690s] datafusion-datasource
    Building datafusion-datasource-parquet v54.0.0 (current)
       Built [  46.176s] (current)
     Parsing datafusion-datasource-parquet v54.0.0 (current)
      Parsed [   0.033s] (current)
    Building datafusion-datasource-parquet v54.0.0 (baseline)
       Built [  45.953s] (baseline)
     Parsing datafusion-datasource-parquet v54.0.0 (baseline)
      Parsed [   0.034s] (baseline)
    Checking datafusion-datasource-parquet v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.243s] 223 checks: 223 pass, 30 skip
     Summary no semver update required
    Finished [  94.174s] datafusion-datasource-parquet
    Building datafusion-physical-expr-adapter v54.0.0 (current)
       Built [  31.981s] (current)
     Parsing datafusion-physical-expr-adapter v54.0.0 (current)
      Parsed [   0.010s] (current)
    Building datafusion-physical-expr-adapter v54.0.0 (baseline)
       Built [  32.906s] (baseline)
     Parsing datafusion-physical-expr-adapter v54.0.0 (baseline)
      Parsed [   0.011s] (baseline)
    Checking datafusion-physical-expr-adapter v54.0.0 -> v54.0.0 (no change; assume patch)
     Checked [   0.119s] 223 checks: 222 pass, 1 fail, 0 warn, 30 skip

--- failure function_missing: pub fn removed or renamed ---

Description:
A publicly-visible function cannot be imported by its prior path. A `pub use` may have been removed, or the function itself may have been renamed or removed entirely.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#item-remove
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.48.0/src/lints/function_missing.ron

Failed in:
  function datafusion_physical_expr_adapter::schema_rewriter::rewrite_input_file_name_in_projection, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/3f4c0ed7f1a37ecdcd04e23cbd6b472de5976d59/datafusion/physical-expr-adapter/src/schema_rewriter.rs:203
  function datafusion_physical_expr_adapter::schema_rewriter::rewrite_file_row_index_projection, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/3f4c0ed7f1a37ecdcd04e23cbd6b472de5976d59/datafusion/physical-expr-adapter/src/schema_rewriter.rs:169
  function datafusion_physical_expr_adapter::rewrite_file_row_index_projection, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/3f4c0ed7f1a37ecdcd04e23cbd6b472de5976d59/datafusion/physical-expr-adapter/src/schema_rewriter.rs:169
  function datafusion_physical_expr_adapter::schema_rewriter::rewrite_file_row_index_expr, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/3f4c0ed7f1a37ecdcd04e23cbd6b472de5976d59/datafusion/physical-expr-adapter/src/schema_rewriter.rs:142
  function datafusion_physical_expr_adapter::rewrite_file_row_index_expr, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/3f4c0ed7f1a37ecdcd04e23cbd6b472de5976d59/datafusion/physical-expr-adapter/src/schema_rewriter.rs:142
  function datafusion_physical_expr_adapter::schema_rewriter::expr_references_scalar_udf, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/3f4c0ed7f1a37ecdcd04e23cbd6b472de5976d59/datafusion/physical-expr-adapter/src/schema_rewriter.rs:93
  function datafusion_physical_expr_adapter::expr_references_scalar_udf, previously in file /home/runner/work/datafusion/datafusion/target/semver-checks/git-apache_main/3f4c0ed7f1a37ecdcd04e23cbd6b472de5976d59/datafusion/physical-expr-adapter/src/schema_rewriter.rs:93

     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  66.110s] datafusion-physical-expr-adapter

@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 23, 2026

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AdamGS -- let me know about the placement in modules

Comment thread datafusion/datasource/src/rewrite.rs Outdated
Comment on lines +18 to +19
//! Rewrites that lower scan-metadata scalar UDFs into concrete physical
//! expressions for the specific file being scanned.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good to make this more general "rewrite expressions in preparation for files being scanned, such as scan-metadata scalar UDFs"

Comment thread datafusion/datasource/src/rewrite.rs Outdated
//! Rewrites that lower scan-metadata scalar UDFs into concrete physical
//! expressions for the specific file being scanned.
//!
//! Functions like `file_row_index()` and `input_file_name()` are placeholders

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend making these actual links so the compiler can make sure the do not get stale

//! Functions like [`file_row_index()`] and [`input_file_name()`] are placeholders

Comment thread datafusion/datasource/Cargo.toml Outdated
datafusion-common-runtime = { workspace = true }
datafusion-execution = { workspace = true }
datafusion-expr = { workspace = true }
datafusion-functions = { workspace = true }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are trying very hard to avoid this dependency -- the data sources should not depend on the pre-packaged functions to keep them as general as possible

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe

datafusion-physical-expr-adatper

That would be a better choice

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is that datafusion-physical-expr-adapter already depends on datafusion-functions, and datafusion-datasource depends on it. I was also wondering about this tree, I was considering introducing another crate here but that seems like a big change for something this small.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move it there for now, if this pattern and module keep expanding at some point it'll make more sense to maybe rethink some of the crate structure around it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, I've also updated the PR's description and title.

@AdamGS AdamGS changed the title Move source-specific UDF rewrite functionality to datafusion-datasource Group scan time expression rewrite functionality for UDFs in new module in datafusion-physical-expr-adapter Jun 23, 2026
@AdamGS AdamGS force-pushed the adamg/udf-specific-rewrites branch from fafcff4 to a015ecd Compare June 23, 2026 13:56

@alamb alamb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@alamb alamb added this pull request to the merge queue Jun 23, 2026
Merged via the queue into apache:main with commit 42cfd8a Jun 23, 2026
38 checks passed
@AdamGS AdamGS deleted the adamg/udf-specific-rewrites branch June 23, 2026 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto detected api change Auto detected API change datasource Changes to the datasource crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants